feat(server): New TTL system, enforce max queue length limits, lazy waitpoint creation#2980
Walkthrough (CodeRabbit)
Centralizes queue-size logic (new v3/queueLimits utility and environment queueSizeLimit exposure) and adds an LRU cache for environment queue lengths. Refactors queue validation to per-queue semantics (resolveQueueNamesForBatchItems, validateMultipleQueueLimits) and surfaces itemsSkipped/runCount through the batch streaming APIs. Introduces per-item retry for batch queue processing, batch-run-count updates, and a TriggerFailedTaskService for creating pre-failed runs. Adds a TTL expiration subsystem (batched TTL consumers, Redis TTL scripts, ttlSystem callback) and lazy get-or-create waitpoints with related waitpoint APIs. Numerous RunEngine/RunQueue/BatchQueue public API additions and test updates; UI presenters and routes updated to use the single queueSize quota.
This PR implements a new run TTL system and queue size limits to prevent unbounded queue growth, which should help avoid situations where a queue enters a "death spiral" and can never catch up.
Run TTL system
The main (and correct) way to combat this situation is to enforce a maximum TTL on all runs (e.g. up to 14 days): runs that have been queued for that maximum TTL are auto-expired, making room for newer runs to execute. This required a new TTL system that can handle higher workloads and is now deeply integrated into the RunQueue. When runs are enqueued with a TTL, they are added to their normal queue as well as to the TTL queue. When runs are dequeued, they are removed from both. If a run is instead dequeued by the TTL system, it is also removed from its normal queue. Both removals happen atomically, so there is no race condition.
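In sketch form, the dual-queue bookkeeping could look like the following (a minimal sketch using ioredis; the key names, data structures, and function signatures are illustrative assumptions, not the actual RunQueue internals):

```ts
import Redis from "ioredis";

const redis = new Redis();

// Enqueue: score the run by enqueue time in its normal queue and, when a TTL
// is set, by its absolute expiry timestamp in the shared TTL queue.
async function enqueueRun(queueKey: string, runId: string, ttlMs?: number) {
  const tx = redis.multi().zadd(queueKey, Date.now(), runId);
  if (ttlMs !== undefined) {
    tx.zadd("ttl:queue", Date.now() + ttlMs, runId);
  }
  await tx.exec();
}

// Normal dequeue: remove the run from both structures in one transaction so
// the TTL system can never expire a run that has already been picked up.
async function dequeueRun(queueKey: string, runId: string) {
  await redis.multi().zrem(queueKey, runId).zrem("ttl:queue", runId).exec();
}
```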
The TTL expiration system is also made reliable by expiring runs via a Redis worker; the worker job is enqueued atomically inside the TTL dequeue Lua script.
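A hedged sketch of that atomic hand-off, assuming an ioredis custom command (the script body, key names, and worker list are illustrative; the real Redis TTL scripts differ):

```ts
import Redis from "ioredis";

const redis = new Redis();

// Pop due runs from the TTL queue and enqueue the expiration job for each in
// the same script, so a run can't be lost or expired twice between the steps.
redis.defineCommand("dequeueExpiredRuns", {
  numberOfKeys: 2,
  lua: `
    local dueRuns = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1], 'LIMIT', 0, tonumber(ARGV[2]))
    for _, runId in ipairs(dueRuns) do
      redis.call('ZREM', KEYS[1], runId)
      redis.call('LPUSH', KEYS[2], runId) -- hand off to the expiration worker
    end
    return dueRuns
  `,
});

// Usage: a polling consumer drains due runs in batches; the worker list is
// then consumed by the Redis worker that actually marks runs as expired.
async function pollExpiredRuns() {
  return (redis as any).dequeueExpiredRuns("ttl:queue", "ttl:worker", Date.now(), 100);
}
```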
Optional associated waitpoints
Additionally, this PR implements an optimization: runs that aren't triggered with a dependent parent run no longer create an associated waitpoint. The waitpoint is instead created lazily if a dependent run wants to wait for the child run after the fact (via debounce or idempotency), which is rare but possible. This means fewer waitpoint creations, and also fewer waitpoint completions, for runs with no dependencies.
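A minimal sketch of the get-or-create pattern, with an in-memory map standing in for the database (all names, and the pre-completed handling for already-finished runs, are illustrative assumptions):

```ts
import { randomUUID } from "node:crypto";

type Waitpoint = { id: string; runId: string; status: "PENDING" | "COMPLETED" };

// Stand-in for the database table, keyed by the run the waitpoint completes.
const waitpointsByRun = new Map<string, Waitpoint>();

function getOrCreateWaitpoint(runId: string, runIsFinished: boolean): Waitpoint {
  const existing = waitpointsByRun.get(runId);
  if (existing) return existing;

  // Created lazily, only when a dependent run waits on this run post-facto.
  // If the run already finished, the waitpoint is created pre-completed so
  // the waiter resolves immediately instead of blocking forever (an
  // assumption about how the completed-run edge case is handled).
  const waitpoint: Waitpoint = {
    id: randomUUID(),
    runId,
    status: runIsFinished ? "COMPLETED" : "PENDING",
  };
  waitpointsByRun.set(runId, waitpoint);
  return waitpoint;
}
```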
Environment queue limits
Prevents any single queue from growing too large by enforcing queue size limits at trigger time.
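As a sketch, the trigger-time guard amounts to a length check against the environment's limit (ZCARD as the length probe and the error shape are assumptions; the real code centralizes this in the v3/queueLimits utility with an LRU cache of environment queue lengths):

```ts
import Redis from "ioredis";

const redis = new Redis();

// Reject a trigger when the target queue is already at its size limit.
async function assertQueueHasCapacity(queueKey: string, queueSizeLimit: number) {
  const length = await redis.zcard(queueKey);
  if (length >= queueSizeLimit) {
    throw new Error(
      `Queue ${queueKey} is full (${length}/${queueSizeLimit} runs); rejecting trigger`
    );
  }
}
```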
Batch trigger fixes
Currently, when a batch item cannot be created for whatever reason (e.g. queue limits), the run never gets created, which means a stalled run if you're using batchTriggerAndWait. We've updated the system to handle this differently: when a batch item cannot be triggered and converted into a run, we will eventually (after 8 retries, backing off up to 30s) create a "pre-failed" run with the error details, correctly resolving the batchTriggerAndWait.
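Roughly, the per-item flow looks like this (the 8 attempts and 30s cap come from the description above; the backoff curve and helper signatures are illustrative, and the real implementation retries via the batch queue rather than in-process sleeps as shown here):

```ts
type BatchItem = { index: number; payload: unknown };

const MAX_ATTEMPTS = 8;
const MAX_DELAY_MS = 30_000;

async function processBatchItem(
  item: BatchItem,
  trigger: (item: BatchItem) => Promise<void>,
  createPreFailedRun: (item: BatchItem, error: unknown) => Promise<void>
): Promise<void> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      await trigger(item); // convert the batch item into a real run
      return;
    } catch (error) {
      if (attempt === MAX_ATTEMPTS) {
        // Out of retries: persist a "pre-failed" run carrying the error so a
        // waiting batchTriggerAndWait resolves instead of stalling forever.
        await createPreFailedRun(item, error);
        return;
      }
      // Exponential backoff capped at 30 seconds (illustrative schedule).
      const delayMs = Math.min(2 ** attempt * 500, MAX_DELAY_MS);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```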